library(AER)
library(dplyr)
library(DBI)
library(readxl)
library(knitr)
library(xtable)
library(dbplyr)
library(TTR)
library(fpp2)
library(tsbox)
library(zoo)
library(lubridate)
library(forecast)
library(ggplot2)
library(tseries)
library(dygraphs)
library(stats)
library(vars)
library(gbm)
library(Hmisc)
library(DT)
library(xts)
Import the datasets.
alarms <- read.csv("allarmi_30147127.csv")
pressures <- read.csv("p_30147127.csv")
Let’s start with an exploratory data analysis. It is always fundamental to have a deep understanding of the datasets.
datatable(head(pressures, 100), options = list(
columnDefs = list(list(className = 'dt-center')),
pageLength = 5
))
The pressures dataset contains a column named “ts”, with a date and a time, and a “value” column holding the pressure readings, rounded to two decimal places.
datatable(head(alarms, 100), options = list(
columnDefs = list(list(className = 'dt-center')),
pageLength = 5
))
The alarms dataset, on the other hand, contains a “date_time” column, with a date and a time, and one named “priorita”, with the priority of the alarm that occurred.
str(alarms)
## 'data.frame': 171 obs. of 2 variables:
## $ date_time: chr "2018-04-10 07:40:42" "2018-04-10 07:42:23" "2018-11-13 09:22:19" "2018-11-14 10:03:21" ...
## $ priorita : chr "alta" "bassa" "bassa" "bassa" ...
str(pressures)
## 'data.frame': 20625 obs. of 2 variables:
## $ ts : chr "2018-03-18 00:00:00" "2018-03-18 00:15:00" "2018-03-18 00:30:00" "2018-03-18 00:45:00" ...
## $ value: num 23.7 23.7 23.7 23.7 23.7 ...
unique(alarms$priorita)
## [1] "alta" "bassa" "media"
“priorita” takes only three values. The “date_time” and “ts” columns of the two datasets are stored as character strings, so it is useful to convert them to POSIXct. Likewise, “priorita” is not usable as a character, so it will be converted to a numeric type.
alarms$date_time = ymd_hms(alarms$date_time)
alarms$priorita = recode(alarms$priorita, "bassa"=1, "media"=2, "alta"=3)
pressures$ts = ymd_hms(pressures$ts)
The priorities were encoded as 1, 2 and 3 so that, in the merged dataset, 0 can represent the condition with no alarm, which necessarily appears because the two datasets have different sizes. It can be useful to plot the pressures dataset…
plot(pressures$ts, pressures$value, type="l")
… and to take a look at the summary statistics.
summary(pressures)
## ts value
## Min. :2018-03-18 00:00:00 Min. : 0.04514
## 1st Qu.:2018-07-03 11:00:00 1st Qu.:23.34588
## Median :2018-08-17 08:00:00 Median :23.60788
## Mean :2018-09-15 10:54:27 Mean :23.60724
## 3rd Qu.:2018-12-25 03:00:00 3rd Qu.:23.86989
## Max. :2019-04-17 23:00:00 Max. :26.19898
summary(alarms)
## date_time priorita
## Min. :2018-04-10 07:40:42 Min. :1.000
## 1st Qu.:2018-12-28 09:18:05 1st Qu.:2.000
## Median :2018-12-31 08:53:15 Median :2.000
## Mean :2018-12-28 11:48:51 Mean :2.205
## 3rd Qu.:2019-01-08 10:39:05 3rd Qu.:3.000
## Max. :2019-01-08 10:41:04 Max. :3.000
From the summary statistics and the plot, it seems clear that there is an outlier: a very small value. It is useful to check whether this value corresponds to an alarm.
data = pressures[pressures$value==min(pressures$value),1]
data
## [1] "2018-04-10 07:42:04 UTC"
alarms[format(alarms$date_time, "%Y-%m-%d %h")==format(data, "%Y-%m-%d %h"),]

| date_time | priorita |
|---|---|
| 2018-04-10 07:40:42 | 3 |
| 2018-04-10 07:42:23 | 1 |
There are alarms in correspondence of this value, so it is plausible that this pressure was actually observed.
Checking for duplicates:
alarms[duplicated(alarms),]

| date_time | priorita |
|---|---|
pressures[duplicated(pressures),]

| ts | value |
|---|---|
There are no duplicates in either dataset.
It is usually useful to check for missing values as well:
library(naniar)
alarms %>%
miss_var_summary()

| variable | n_miss | pct_miss |
|---|---|---|
| date_time | 0 | 0 |
| priorita | 0 | 0 |
pressures %>%
miss_var_summary()

| variable | n_miss | pct_miss |
|---|---|---|
| ts | 0 | 0 |
| value | 0 | 0 |
In these datasets there are no missing values.
It is necessary to group the alarm observations into 15-minute intervals, matching the sampling of the pressures. The “priorita” column is summarized with max(), so that within each 15-minute window the highest priority is taken as the main one.
alarms2 = alarms %>%
mutate(date_time = floor_date(date_time, unit="15 mins")) %>%
group_by(date_time) %>%
summarise(priorita=max(priorita))
Now it is possible to merge the two datasets. The merge is an outer one, so that all the pressure values and the alarm priorities are kept. The resulting NAs in the “priorita” column are then replaced with 0, since the absence of an alarm can be indexed with that value.
df = merge(alarms2, pressures, by.x="date_time", by.y="ts", all=T)
df$priorita[is.na(df$priorita)] = 0
summary(df)
## date_time priorita value
## Min. :2018-03-18 00:00:00 Min. :0.000000 Min. : 0.04514
## 1st Qu.:2018-07-03 12:06:29 1st Qu.:0.000000 1st Qu.:23.34588
## Median :2018-08-17 08:13:15 Median :0.000000 Median :23.60788
## Mean :2018-09-15 11:35:17 Mean :0.001357 Mean :23.60724
## 3rd Qu.:2018-12-25 05:45:00 3rd Qu.:0.000000 3rd Qu.:23.86989
## Max. :2019-04-17 23:00:00 Max. :3.000000 Max. :26.19898
## NA's :9
Some pressure values are NA after the merge (alarm timestamps with no matching pressure reading), and they need to be replaced. One idea is to compute the mean pressure for the “alta”, “media” and “bassa” priority levels and impute with those means; this was tried but is not effective, so a na.locf (last observation carried forward) procedure is applied instead.
#m1 = mean(df$value[(df$priorita==1)&(!is.na(df$value))])
#m2 = mean(df$value[(df$priorita==2)&(!is.na(df$value))])
#m3 = mean(df$value[(df$priorita==3)&(!is.na(df$value))])
#m1
#m2
#m3
#df$value[(df$priorita==1)&(is.na(df$value))] = m1
#df$value[(df$priorita==2)&(is.na(df$value))] = m2
#df$value[(df$priorita==3)&(is.na(df$value))] = m3
df = na.locf(df)
The creation of xts and ts objects will be useful for the following operations.
priorita_xts=xts(x=df$priorita, order.by=df$date_time)
value_xts=xts(x=df$value, order.by=df$date_time)
priorita_ts=as.ts(x=df$priorita, start=df$date_time[1])
value_ts=as.ts(x=df$value, start=df$date_time[1])
Regularity and stationarity are two of the main characteristics of a time series.
is.regular(priorita_xts)
## [1] TRUE
is.regular(value_xts)
## [1] TRUE
is.regular(priorita_xts, strict=T)
## [1] FALSE
is.regular(value_xts, strict=T)
## [1] FALSE
The time series are regular but not strictly regular: there is a regular underlying grid, but the observations are not all equally spaced.
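A toy illustration of the distinction (hypothetical data, not the project series), using zoo’s is.regular:

```r
library(zoo)
# Times 1, 2, 3, 5 all lie on an underlying unit grid, so the series is
# regular; the gap between 3 and 5 means the spacing is not constant,
# so it is not strictly regular
z <- zoo(1:4, c(1, 2, 3, 5))
is.regular(z)
is.regular(z, strict = TRUE)
```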
adf.test(priorita_ts, alternative="stationary", k=0)
##
## Augmented Dickey-Fuller Test
##
## data: priorita_ts
## Dickey-Fuller = -143.73, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
adf.test(value_ts, alternative="stationary", k=0)
##
## Augmented Dickey-Fuller Test
##
## data: value_ts
## Dickey-Fuller = -26.631, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
The augmented Dickey-Fuller test indicates that both series are stationary: the small p-values reject the unit-root null.
Setting an appropriate frequency is essential for effective forecasting.
frequency(value_xts)
## [1] 1
frequency(priorita_xts)
## [1] 1
periodicity(value_xts)
## 15 minute periodicity from 2018-03-18 to 2019-04-17 23:00:00
periodicity(priorita_xts)
## 15 minute periodicity from 2018-03-18 to 2019-04-17 23:00:00
The observed data have a 15-minute periodicity, so several seasonal frequencies are possible: a frequency of 96 (4 × 24) would correspond to a daily seasonal period. Here a frequency of 4, i.e. an hourly period, is set.
attr(value_xts, "frequency") = 4
attr(priorita_xts, "frequency") = 4
periodicity(value_xts)
## 15 minute periodicity from 2018-03-18 to 2019-04-17 23:00:00
periodicity(priorita_xts)
## 15 minute periodicity from 2018-03-18 to 2019-04-17 23:00:00
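Had the daily cycle been of interest, the same attr() mechanism could set a daily seasonal period instead; a minimal sketch of that alternative (not what this analysis uses):

```r
# Hypothetical alternative: 96 observations per day (4 per hour * 24 hours)
# gives ts-based models a daily seasonal period for 15-minute data;
# the xts time stamps themselves are unchanged
attr(value_xts, "frequency") <- 96
frequency(value_xts)
```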
There seems to be no need to filter the dataset.
acf(value_xts)
acf(priorita_xts)
pacf(value_xts)
pacf(priorita_xts)
The first model used to forecast the series is an ARIMA model. To do so, it is necessary to consider the univariate time series composed of the pressure values.
The first step is to split the dataset into two subsets: a training set and a validation set.
train = value_xts[index(value_xts) <= "2019-04-14"]
test = value_xts[index(value_xts) > "2019-04-14"]
Searching for the best ARIMA model according to the training set:
arima <- auto.arima(train, trace = T)
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,1,2)(1,0,1)[4] with drift : -13407.69
## ARIMA(0,1,0) with drift : -10306.43
## ARIMA(1,1,0)(1,0,0)[4] with drift : -11132.52
## ARIMA(0,1,1)(0,0,1)[4] with drift : -11066.75
## ARIMA(0,1,0) : -10308.43
## ARIMA(2,1,2)(0,0,1)[4] with drift : Inf
## ARIMA(2,1,2)(1,0,0)[4] with drift : Inf
## ARIMA(2,1,2)(2,0,1)[4] with drift : Inf
## ARIMA(2,1,2)(1,0,2)[4] with drift : Inf
## ARIMA(2,1,2) with drift : Inf
## ARIMA(2,1,2)(0,0,2)[4] with drift : Inf
## ARIMA(2,1,2)(2,0,0)[4] with drift : Inf
## ARIMA(2,1,2)(2,0,2)[4] with drift : -13449.21
## ARIMA(1,1,2)(2,0,2)[4] with drift : Inf
## ARIMA(2,1,1)(2,0,2)[4] with drift : Inf
## ARIMA(3,1,2)(2,0,2)[4] with drift : -13705.36
## ARIMA(3,1,2)(1,0,2)[4] with drift : Inf
## ARIMA(3,1,2)(2,0,1)[4] with drift : Inf
## ARIMA(3,1,2)(1,0,1)[4] with drift : -13587.17
## ARIMA(3,1,1)(2,0,2)[4] with drift : -13481
## ARIMA(3,1,3)(2,0,2)[4] with drift : -13195.95
## ARIMA(2,1,3)(2,0,2)[4] with drift : -13101.3
## ARIMA(3,1,2)(2,0,2)[4] : -13707.36
## ARIMA(3,1,2)(1,0,2)[4] : Inf
## ARIMA(3,1,2)(2,0,1)[4] : Inf
## ARIMA(3,1,2)(1,0,1)[4] : -13589.15
## ARIMA(2,1,2)(2,0,2)[4] : Inf
## ARIMA(3,1,1)(2,0,2)[4] : -13482.96
## ARIMA(3,1,3)(2,0,2)[4] : -13197.86
## ARIMA(2,1,1)(2,0,2)[4] : Inf
## ARIMA(2,1,3)(2,0,2)[4] : -13098.58
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(3,1,2)(2,0,2)[4] : -13720.79
##
## Best model: ARIMA(3,1,2)(2,0,2)[4]
Extracting the fitted values of the model:
fit_arima <- xts(arima$fitted, order.by = index(train))
Visually checking the results of the fit:
confronto_arima <- cbind(fit_arima, train)
dygraph(confronto_arima) %>% dyRangeSelector()
Checking the results with the accuracy measures:
accuracy(arima)
## ME RMSE MAE MPE MAPE MASE
## Training set 9.283695e-05 0.1727744 0.02977672 -0.3902496 0.5797132 0.4910058
## ACF1
## Training set -0.007031916
# pseudo R2
cor(fitted(arima), train)^2
## [,1]
## [1,] 0.9017884
Applying the fitted ARIMA model to the validation set:
predizione_arima <- predict(arima, nrow(test), prediction.interval = TRUE)
predizione_arima
## $pred
## Qtr1 Qtr2 Qtr3 Qtr4
## 5101 23.97979 23.98836
## 5102 23.99083 23.99679 24.00395 24.00735
## 5103 24.01043 24.00991 24.01005 24.00914
## 5104 24.00810 24.00687 24.00577 24.00493
## 5105 24.00439 24.00417 24.00419 24.00440
## 5106 24.00471 24.00506 24.00537 24.00562
## 5107 24.00579 24.00587 24.00587 24.00582
## 5108 24.00573 24.00563 24.00554 24.00546
## 5109 24.00541 24.00538 24.00538 24.00539
## 5110 24.00542 24.00545 24.00547 24.00550
## 5111 24.00551 24.00552 24.00552 24.00552
## 5112 24.00551 24.00550 24.00550 24.00549
## 5113 24.00548 24.00548 24.00548 24.00548
## 5114 24.00548 24.00549 24.00549 24.00549
## 5115 24.00549 24.00549 24.00549 24.00549
## 5116 24.00549 24.00549 24.00549 24.00549
## 5117 24.00549 24.00549 24.00549 24.00549
## 5118 24.00549 24.00549 24.00549 24.00549
## 5119 24.00549 24.00549 24.00549 24.00549
## 5120 24.00549 24.00549 24.00549 24.00549
## 5121 24.00549 24.00549 24.00549 24.00549
## 5122 24.00549 24.00549 24.00549 24.00549
## 5123 24.00549 24.00549 24.00549 24.00549
## 5124 24.00549 24.00549 24.00549 24.00549
## 5125 24.00549 24.00549 24.00549 24.00549
## 5126 24.00549 24.00549 24.00549 24.00549
## 5127 24.00549 24.00549 24.00549 24.00549
## 5128 24.00549 24.00549 24.00549 24.00549
## 5129 24.00549 24.00549 24.00549 24.00549
## 5130 24.00549 24.00549 24.00549 24.00549
## 5131 24.00549 24.00549 24.00549 24.00549
## 5132 24.00549 24.00549 24.00549 24.00549
## 5133 24.00549 24.00549 24.00549 24.00549
## 5134 24.00549 24.00549 24.00549 24.00549
## 5135 24.00549 24.00549 24.00549 24.00549
## 5136 24.00549 24.00549 24.00549 24.00549
## 5137 24.00549 24.00549 24.00549 24.00549
## 5138 24.00549 24.00549 24.00549 24.00549
## 5139 24.00549 24.00549 24.00549 24.00549
## 5140 24.00549 24.00549 24.00549 24.00549
## 5141 24.00549 24.00549 24.00549 24.00549
## 5142 24.00549 24.00549 24.00549 24.00549
## 5143 24.00549 24.00549 24.00549 24.00549
## 5144 24.00549 24.00549 24.00549 24.00549
## 5145 24.00549 24.00549 24.00549 24.00549
## 5146 24.00549 24.00549 24.00549 24.00549
## 5147 24.00549 24.00549 24.00549 24.00549
## 5148 24.00549 24.00549 24.00549 24.00549
## 5149 24.00549 24.00549 24.00549 24.00549
## 5150 24.00549 24.00549 24.00549 24.00549
## 5151 24.00549 24.00549 24.00549 24.00549
## 5152 24.00549 24.00549 24.00549 24.00549
## 5153 24.00549 24.00549 24.00549 24.00549
## 5154 24.00549 24.00549 24.00549 24.00549
## 5155 24.00549 24.00549 24.00549 24.00549
## 5156 24.00549 24.00549 24.00549 24.00549
## 5157 24.00549 24.00549 24.00549 24.00549
## 5158 24.00549 24.00549 24.00549 24.00549
## 5159 24.00549 24.00549
##
## $se
## Qtr1 Qtr2 Qtr3 Qtr4
## 5101 0.1728168 0.2523054
## 5102 0.3176417 0.3596635 0.3979347 0.4183203
## 5103 0.4277992 0.4307304 0.4333349 0.4341933
## 5104 0.4345410 0.4346872 0.4348128 0.4350048
## 5105 0.4353660 0.4359993 0.4369502 0.4382334
## 5106 0.4397800 0.4414842 0.4432295 0.4449220
## 5107 0.4465022 0.4479468 0.4492610 0.4504674
## 5108 0.4515967 0.4526805 0.4537470 0.4548187
## 5109 0.4559109 0.4570317 0.4581829 0.4593608
## 5110 0.4605583 0.4617667 0.4629776 0.4641839
## 5111 0.4653807 0.4665653 0.4677372 0.4688974
## 5112 0.4700481 0.4711916 0.4723305 0.4734667
## 5113 0.4746019 0.4757368 0.4768718 0.4780066
## 5114 0.4791408 0.4802735 0.4814043 0.4825324
## 5115 0.4836575 0.4847793 0.4858978 0.4870129
## 5116 0.4881249 0.4892340 0.4903403 0.4914441
## 5117 0.4925455 0.4936447 0.4947417 0.4958364
## 5118 0.4969289 0.4980191 0.4991070 0.5001926
## 5119 0.5012758 0.5023566 0.5034350 0.5045110
## 5120 0.5055848 0.5066562 0.5077253 0.5087921
## 5121 0.5098568 0.5109192 0.5119794 0.5130375
## 5122 0.5140934 0.5151472 0.5161988 0.5172482
## 5123 0.5182956 0.5193408 0.5203839 0.5214250
## 5124 0.5224639 0.5235008 0.5245357 0.5255685
## 5125 0.5265992 0.5276280 0.5286548 0.5296795
## 5126 0.5307023 0.5317232 0.5327420 0.5337590
## 5127 0.5347740 0.5357870 0.5367982 0.5378075
## 5128 0.5388148 0.5398203 0.5408239 0.5418257
## 5129 0.5428256 0.5438237 0.5448199 0.5458143
## 5130 0.5468069 0.5477978 0.5487868 0.5497740
## 5131 0.5507595 0.5517432 0.5527252 0.5537054
## 5132 0.5546839 0.5556607 0.5566358 0.5576091
## 5133 0.5585808 0.5595508 0.5605190 0.5614857
## 5134 0.5624506 0.5634139 0.5643756 0.5653356
## 5135 0.5662940 0.5672508 0.5682060 0.5691595
## 5136 0.5701115 0.5710619 0.5720107 0.5729579
## 5137 0.5739036 0.5748477 0.5757903 0.5767313
## 5138 0.5776708 0.5786088 0.5795452 0.5804802
## 5139 0.5814136 0.5823456 0.5832760 0.5842050
## 5140 0.5851325 0.5860585 0.5869831 0.5879062
## 5141 0.5888278 0.5897481 0.5906669 0.5915842
## 5142 0.5925002 0.5934147 0.5943278 0.5952395
## 5143 0.5961499 0.5970588 0.5979664 0.5988725
## 5144 0.5997774 0.6006808 0.6015829 0.6024836
## 5145 0.6033830 0.6042811 0.6051778 0.6060732
## 5146 0.6069673 0.6078601 0.6087515 0.6096417
## 5147 0.6105305 0.6114181 0.6123044 0.6131894
## 5148 0.6140731 0.6149555 0.6158367 0.6167166
## 5149 0.6175953 0.6184727 0.6193489 0.6202238
## 5150 0.6210976 0.6219700 0.6228413 0.6237113
## 5151 0.6245802 0.6254478 0.6263142 0.6271794
## 5152 0.6280435 0.6289063 0.6297680 0.6306285
## 5153 0.6314878 0.6323459 0.6332029 0.6340587
## 5154 0.6349134 0.6357669 0.6366193 0.6374705
## 5155 0.6383207 0.6391696 0.6400175 0.6408642
## 5156 0.6417098 0.6425543 0.6433977 0.6442400
## 5157 0.6450811 0.6459212 0.6467602 0.6475981
## 5158 0.6484350 0.6492707 0.6501054 0.6509390
## 5159 0.6517715 0.6526030
By transforming this prediction into an xts object and combining it with the actual values, it is possible to plot the prediction against the observed series.
# extract prediction values
predizione_arima_xts <- xts(predizione_arima$pred, order.by = index(test))
# combine original values + training + prediction
modello_arima <- cbind(predizione_arima_xts, test, train)
Plot:
dygraph(modello_arima) %>% dyRangeSelector() %>%
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set1"))
Furthermore, it is useful to check the accuracy measures of the forecast:
accuracy(predizione_arima$pred, test)
## ME RMSE MAE MPE MAPE
## Test set -0.01770934 0.0597344 0.04656612 -0.07439064 0.1943499
end(test)
## [1] "2019-04-17 23:00:00 UTC"
The last timestamp is 2019-04-17 23:00:00, so the forecast can cover the following 12 hours.
## Next time index
time_index <- seq(from = as.POSIXct("2019-04-17 23:00:00"), to = as.POSIXct("2019-04-18 11:00:00"), by="15 min")
# number of periods to forecast
index(time_index)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
forecast = forecast(arima, length(time_index))
forecast_xts <- xts(forecast$mean, order.by = time_index)
Finally, the plot of the forecast:
predizione<- cbind(forecast_xts, value_xts)
dygraph(predizione) %>% dyRangeSelector() %>%
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set1"))
The usual approach is to use Johansen’s method for testing whether or not cointegration exists. If the answer is “yes”, then a vector error correction model (VECM), which combines levels and differences, can be estimated instead of a VAR in levels.
train = as.ts(df[df$date_time <= "2019-04-14",c(2,3)])
test = as.ts(df[df$date_time > "2019-04-14",c(2,3)])
train2 = xts(df[df$date_time <= "2019-04-14",], order.by=df$date_time[df$date_time <= "2019-04-14"])
test2 = xts(df[df$date_time > "2019-04-14",], order.by=df$date_time[df$date_time > "2019-04-14"])
po.test(train)
##
## Phillips-Ouliaris Cointegration Test
##
## data: train
## Phillips-Ouliaris demeaned = -26178, Truncation lag parameter = 204,
## p-value = 0.01
The Phillips-Ouliaris test rejects the null of no cointegration, so the two series are cointegrated. This was expected, since one series is highly correlated with the other.
df_ts = as.ts(df[,c(2,3)])
var_select <- VARselect(df_ts, lag.max = 10, type = "both")
cointest <- ca.jo(train, K=var_select$selection[1], type = "eigen", ecdet = "const", spec = "transitory")
summary(cointest)
##
## ######################
## # Johansen-Procedure #
## ######################
##
## Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration
##
## Eigenvalues (lambda):
## [1] 8.785177e-02 2.196131e-02 6.152169e-21
##
## Values of teststatistic and critical values of test:
##
## test 10pct 5pct 1pct
## r <= 1 | 452.83 7.52 9.24 12.97
## r = 0 | 1875.10 13.75 15.67 20.20
##
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
##
## priorita.l1 value.l1 constant
## priorita.l1 1.000000000 1.0000000 1.000000
## value.l1 -0.004417625 0.6210115 3.855939
## constant 0.102896872 -14.6593179 2634.952726
##
## Weights W:
## (This is the loading matrix)
##
## priorita.l1 value.l1 constant
## priorita.d -0.8985978 -0.002474699 1.180535e-21
## value.d 0.1762669 -0.082125830 3.637673e-20
Johansen’s procedure highlights that the rank of the matrix is 2, since both the r = 0 and r <= 1 hypotheses are rejected.
vecm <- cajorls(cointest, r=1)
#Transform VECM to VAR
var <- vec2var(cointest)
# Predict
forecast_dfb <- predict(var, n.ahead=232) # VECM
# xts and ts class
predizione_dfb <- xts(forecast_dfb$fcst$value[,1] , order.by = index(test2))
predizione_dfbts <- ts(predizione_dfb)
# combine training and predicted values
pred_dfb <- cbind(predizione_dfb, test2$value, train2$value)
accuracy(predizione_dfbts, as.numeric(test2$value))
## ME RMSE MAE MPE MAPE
## Test set 0.008751104 0.05710784 0.04692652 0.03592482 0.1956478
dygraph(pred_dfb) %>% dyRangeSelector() %>%
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set1"))
time_index <- seq(from = as.POSIXct("2019-04-17 23:00:00"), to = as.POSIXct("2019-04-18 11:00:00"), by="15 min")
index(time_index)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
forecast <- predict(var,n.ahead=length(time_index))
predizione_p_xts <- xts(forecast$fcst$value[,1] , order.by = time_index)
df_xts = xts(df, order.by=df$date_time)
predizione <- cbind(predizione_p_xts,df_xts$value)
dygraph(predizione) %>% dyRangeSelector() %>%
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set1"))

Accuracy measures on the test set:

| Model | ME | RMSE | MAE | MPE | MAPE |
|---|---|---|---|---|---|
| ARIMA(3,1,2)(2,0,2)[4] | -0.01770934 | 0.0597344 | 0.04656612 | -0.07439064 | 0.1943499 |
| VECM | 0.008751104 | 0.05710784 | 0.04692652 | 0.03592482 | 0.1956478 |
So, to summarize, the two models are quite close in performance.
df[df$priorita==1,3]
## [1] 23.69148 23.69148
df[df$priorita==2,3]
## [1] 23.70736
df[df$priorita==3,3]
## [1] 23.49000 24.01327 24.34019 24.22577 26.16302 23.80077 24.02588 24.20522
With the exception of a single value, the thresholds seem to be:
“bassa” -> 23.6, “media” -> 23.7, “alta” -> 23.8
colnames(predizione_p_xts) = "Pressione"
predizione_p_xts$Stato = ifelse(predizione_p_xts$Pressione<23.6, "normale", ifelse(predizione_p_xts$Pressione<23.7, "bassa", ifelse(predizione_p_xts$Pressione<23.8, "media", "alta")))
predizione_p_xts
## Pressione Stato
## 2019-04-17 23:00:00 23.98973 NA
## 2019-04-17 23:15:00 23.99059 NA
## 2019-04-17 23:30:00 23.99202 NA
## 2019-04-17 23:45:00 23.99023 NA
## 2019-04-18 00:00:00 23.98902 NA
## 2019-04-18 00:15:00 23.98549 NA
## 2019-04-18 00:30:00 23.97818 NA
## 2019-04-18 00:45:00 23.97554 NA
## 2019-04-18 01:00:00 23.97306 NA
## 2019-04-18 01:15:00 23.97337 NA
## 2019-04-18 01:30:00 23.97365 NA
## 2019-04-18 01:45:00 23.97558 NA
## 2019-04-18 02:00:00 23.97840 NA
## 2019-04-18 02:15:00 23.97987 NA
## 2019-04-18 02:30:00 23.98095 NA
## 2019-04-18 02:45:00 23.98099 NA
## 2019-04-18 03:00:00 23.98085 NA
## 2019-04-18 03:15:00 23.97991 NA
## 2019-04-18 03:30:00 23.97877 NA
## 2019-04-18 03:45:00 23.97804 NA
## 2019-04-18 04:00:00 23.97756 NA
## 2019-04-18 04:15:00 23.97749 NA
## 2019-04-18 04:30:00 23.97758 NA
## 2019-04-18 04:45:00 23.97801 NA
## 2019-04-18 05:00:00 23.97849 NA
## 2019-04-18 05:15:00 23.97885 NA
## 2019-04-18 05:30:00 23.97907 NA
## 2019-04-18 05:45:00 23.97912 NA
## 2019-04-18 06:00:00 23.97907 NA
## 2019-04-18 06:15:00 23.97888 NA
## 2019-04-18 06:30:00 23.97867 NA
## 2019-04-18 06:45:00 23.97850 NA
## 2019-04-18 07:00:00 23.97840 NA
## 2019-04-18 07:15:00 23.97836 NA
## 2019-04-18 07:30:00 23.97839 NA
## 2019-04-18 07:45:00 23.97847 NA
## 2019-04-18 08:00:00 23.97857 NA
## 2019-04-18 08:15:00 23.97865 NA
## 2019-04-18 08:30:00 23.97870 NA
## 2019-04-18 08:45:00 23.97871 NA
## 2019-04-18 09:00:00 23.97870 NA
## 2019-04-18 09:15:00 23.97866 NA
## 2019-04-18 09:30:00 23.97862 NA
## 2019-04-18 09:45:00 23.97859 NA
## 2019-04-18 10:00:00 23.97856 NA
## 2019-04-18 10:15:00 23.97855 NA
## 2019-04-18 10:30:00 23.97856 NA
## 2019-04-18 10:45:00 23.97857 NA
## 2019-04-18 11:00:00 23.97859 NA
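The “Stato” column prints as NA because an xts object is backed by a single-type matrix: assigning character labels into a numeric xts coerces them to NA. A minimal sketch of a workaround, converting the forecast to a plain data.frame before labelling (the names here are illustrative):

```r
# Keep the numeric forecast and the character state side by side in a data.frame,
# which, unlike xts, can hold columns of different types
stato_df <- data.frame(
  ts        = index(predizione_p_xts),
  Pressione = as.numeric(predizione_p_xts$Pressione)
)
stato_df$Stato <- ifelse(stato_df$Pressione < 23.6, "normale",
                  ifelse(stato_df$Pressione < 23.7, "bassa",
                  ifelse(stato_df$Pressione < 23.8, "media", "alta")))
head(stato_df)
```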